ABSTRACT



In an age of student accountability, public school systems 
must find procedures for identifying effective schools, classrooms, and 
teachers that help students continue to learn academically. As a result, 
researchers have been modeling schools and classrooms to calculate
productivity indicators that will withstand not only statistical review but
political criticisms. There are two main approaches to this problem; one uses
pretest and posttest scores of students and the other uses student gain 
scores. In this paper, a model is developed for each approach, and their 
results are discussed. The outcomes of each model are School Effective 
Indices (SEI) and Classroom Effective Indices. A set of criteria is 
established and the SEIs these models produce are tested against these 
criteria using hierarchical linear modeling (HLM) to determine the "better"
approach. It is suggested that both models produce very similar results. If 
HLM software from SSI Inc. is used, then the pretest-posttest two-level model 
is more convenient and efficient to use than the three-level Gain model. 
(Contains four tables and six references.) (Author/SLD) 
In an age of student accountability, public school systems must find procedures for
identifying effective schools, classrooms and teachers that help students continue to
learn academically. As a result, researchers have been modeling schools and classrooms
to calculate productivity indicators that will withstand not only statistical review but
political criticisms. There are two main approaches to this problem; one uses pre-test
and post-test scores of students, while the other uses student gain scores. In this paper
a model is developed for each approach and their results are discussed. The outcomes
of each model are School Effective Indices (SEI) and Classroom Effective Indices (CEI).
A set of criteria is established, and the SEIs produced by the two models are tested
against these criteria to determine the "better" approach.



Introduction 



The strategy for predicting performance from a pre-test score with adjustments for
covariates is becoming the standard for producing productivity indicators. Currently the
most asked question is: should we be modeling the student's post-test score or student
growth, i.e., gain?

To model student growth or student achievement within a year, the standard statistical
approach is to model an outcome variable measuring the student's achievement or growth in a
regression model as follows: If ITBS96_{ij} is the i-th student's score on ITBS Reading in
school j for year 1995/96, then

ITBS96_{ij} = \beta_0 + r_{ij}    (1)

where r_{ij} are the student residuals, which are N(0, \sigma^2).

* Paper presented at the annual meeting of the Southwest Educational Research Association, Austin, TX,
January 1997.

In this model, r_{ij} measures the distance a student's actual score varies from the
regression line. If r_{ij} > 0, the student performed higher than average, and if r_{ij} < 0,
the student performed lower than average. By aggregating the r_{ij} for each j, we can
calculate a measure of how well school j performed. The question of interest is what r_{ij}
measures. We are interested in measuring student achievement in the year 1995/96, but in this
particular model r_{ij} is not a measure of student growth, but of current standing. This can
be easily observed from the bias of r_{ij} with respect to the student's previous test scores.

The next progression in this model is to remove any biases in r_{ij} with respect to students'
prior test scores, namely ITBS Reading scores from school year 1994/95. Hence equation 1
is expanded as follows:

ITBS96_{ij} = \beta_0 + \beta_1 ITBS95_{ij} + r_{ij}    (2)

Now r_{ij} in the model is conditioned on the student's pre-test score and can be said to be
unbiased with respect to the student's 1994/95 ITBS Reading score. In this sense, r_{ij} is a
measure of student growth in year 1995/96.
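The difference between the residuals of equations (1) and (2) can be illustrated with a small sketch. The scores below are hypothetical toy values, not data from this study, and the helper names `fit_simple_ols` and `pearson` are introduced here only for illustration. The sketch fits both models by ordinary least squares and checks how strongly each set of residuals is correlated with the pre-test.

```python
import math

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical pre-test (1994/95) and post-test (1995/96) scores.
pre = [2.1, 3.4, 4.0, 5.2, 6.3, 4.8]
post = [3.0, 4.1, 5.5, 5.9, 7.8, 5.1]

# Equation (1): intercept only, so the residual is the distance from the mean.
mean_post = sum(post) / len(post)
res_model1 = [y - mean_post for y in post]

# Equation (2): the residual is the distance from the pre-test regression line.
b0, b1 = fit_simple_ols(pre, post)
res_model2 = [y - (b0 + b1 * x) for x, y in zip(pre, post)]

corr_model1 = pearson(pre, res_model1)  # strongly positive: reflects current standing
corr_model2 = pearson(pre, res_model2)  # zero up to rounding: pre-test bias removed
```

In this toy example the residuals of model (1) simply track the pre-test, while the residuals of model (2) are uncorrelated with it by construction of least squares.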

The next major question arises at this stage: what is the outcome variable that should
be modeled? Should it be the student's ITBS Reading score for 1995/96 or should it be the
gain in the student's ITBS Reading score from 1994/95 to 1995/96? To answer this question,
two types of models will be analyzed, the one used by a large urban school district and a
complex gain model recently developed (Bryk and Thum, April 1996). The Bryk and Thum
gain model is as follows:

GAIN9596_{ij} = \beta_0 + \beta_1 ITBS95_{ij} + r_{ij}    (3)

For this study, the measure GAIN9596 will be calculated as follows:

GAIN9596_{ij} = ITBS96GE_{ij} - ITBS95GE_{ij}    (4)

In this study data from 5197 sixth grade students from 118 elementary schools were modeled, 
with no missing data. 
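As a point of reference for the comparison, the single-level versions of equations (2) and (3) are closely related when the gain is computed on the same scale as the pre-test used as predictor. Under that assumption, and with ordinary least squares, the gain model has a slope exactly one less than the post-test model and identical residuals. The sketch below checks this on hypothetical toy scores; it is not the hierarchical estimation used in this study, where the gain is formed from grade-equivalent scores and random effects are present, so the identity need not hold exactly there.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def ols_residuals(x, y):
    """Residuals of y about its least-squares line on x."""
    b0, b1 = fit_simple_ols(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical scores; gain and pre-test are on the same scale by construction.
pre = [2.1, 3.4, 4.0, 5.2, 6.3, 4.8]
post = [3.0, 4.1, 5.5, 5.9, 7.8, 5.1]
gain = [after - before for before, after in zip(pre, post)]

_, slope_post = fit_simple_ols(pre, post)  # slope in the post-test model
_, slope_gain = fit_simple_ols(pre, gain)  # slope in the gain model

res_post = ols_residuals(pre, post)
res_gain = ols_residuals(pre, gain)
max_residual_gap = max(abs(a - b) for a, b in zip(res_post, res_gain))
```

Here `slope_gain` equals `slope_post - 1` and `max_residual_gap` is zero up to rounding, which gives one reason to expect the two approaches to rank students and schools similarly.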



Methodology 



Hierarchical Linear Modeling will be used to compare the Post-Test/Pre-Test Model and 
the Gain Model. This comparison will be carried out for four models of varying complexities. 
The school rankings obtained from the two types of models will be compared to each other 
to observe any differences. 

The four models are as follows: 
The basic models with no student level variables and no school level variables:

PP-1 Model:

READ96_{ij} = \beta_{0j} + \beta_{1j} READ95_{ij} + r_{ij}    (5)

\beta_{0j} = \gamma_{00} + u_{0j}    (6)

\beta_{1j} = \gamma_{10}    (7)

Gain-1 Model:

READGAIN9596_{ijk} = \pi_{0jk} + \pi_{1jk} READ95_{ijk} + e_{ijk}    (8)

\pi_{0jk} = \beta_{00k} + r_{0jk}    (9)

\pi_{1jk} = \beta_{10k} + r_{1jk}    (10)

\beta_{00k} = \gamma_{000} + u_{00k}    (11)

\beta_{10k} = \gamma_{100}    (12)


The basic models with no student level variables and one school level variable, percentage
of minority students in a school, PMINORITY:

PP-2 Model:

READ96_{ij} = \beta_{0j} + \beta_{1j} READ95_{ij} + r_{ij}    (13)

\beta_{0j} = \gamma_{00} + \gamma_{01} PMINORITY_{j} + u_{0j}    (14)

\beta_{1j} = \gamma_{10}    (15)

Gain-2 Model:

READGAIN9596_{ijk} = \pi_{0jk} + \pi_{1jk} READ95_{ijk} + e_{ijk}    (16)

\pi_{0jk} = \beta_{00k} + r_{0jk}    (17)

\pi_{1jk} = \beta_{10k} + r_{1jk}    (18)

\beta_{00k} = \gamma_{000} + \gamma_{001} PMINORITY_{k} + u_{00k}    (19)

\beta_{10k} = \gamma_{100}    (20)
Models with student level variables (GENDER, BLACK and HISPANIC) and no school
level variables:

PP-3 Model:

READ96_{ij} = \beta_{0j} + \beta_{1j} READ95_{ij} + \beta_{2j} SEX_{ij} + \beta_{3j} BLACK_{ij} + \beta_{4j} HISPANIC_{ij}
              + \beta_{5j} SEX_{ij} \cdot READ95_{ij} + \beta_{6j} BLACK_{ij} \cdot READ95_{ij}
              + \beta_{7j} HISPANIC_{ij} \cdot READ95_{ij} + r_{ij}    (21)

\beta_{0j} = \gamma_{00} + u_{0j}    (22)

\beta_{qj} = \gamma_{q0}    (23)

for q = 1, 2, ..., 7.

Gain-3 Model:

READGAIN9596_{ijk} = \pi_{0jk} + \pi_{1jk} READ95_{ijk} + e_{ijk}    (24)

\pi_{0jk} = \beta_{00k} + \beta_{01k} SEX_{jk} + \beta_{02k} BLACK_{jk} + \beta_{03k} HISPANIC_{jk} + r_{0jk}    (25)

\pi_{1jk} = \beta_{10k} + \beta_{11k} SEX_{jk} + \beta_{12k} BLACK_{jk} + \beta_{13k} HISPANIC_{jk} + r_{1jk}    (26)

\beta_{00k} = \gamma_{000} + u_{00k}    (27)

\beta_{pqk} = \gamma_{pq0}    (28)

for p = 0, 1, q = 1, 2, 3, and p = q \neq 0.

Models with student level variables (GENDER, BLACK and HISPANIC) and one school
level variable, percentage of minority students in a school, PMINORITY:

PP-4 Model:

READ96_{ij} = \beta_{0j} + \beta_{1j} READ95_{ij} + \beta_{2j} SEX_{ij} + \beta_{3j} BLACK_{ij} + \beta_{4j} HISPANIC_{ij}
              + \beta_{5j} SEX_{ij} \cdot READ95_{ij} + \beta_{6j} BLACK_{ij} \cdot READ95_{ij}
              + \beta_{7j} HISPANIC_{ij} \cdot READ95_{ij} + r_{ij}    (29)

\beta_{0j} = \gamma_{00} + \gamma_{01} PMINORITY_{j} + u_{0j}    (30)

\beta_{qj} = \gamma_{q0}    (31)

for q = 1, 2, ..., 7.
Gain-4 Model:

READGAIN9596_{ijk} = \pi_{0jk} + \pi_{1jk} READ95_{ijk} + e_{ijk}    (32)

\pi_{0jk} = \beta_{00k} + \beta_{01k} SEX_{jk} + \beta_{02k} BLACK_{jk} + \beta_{03k} HISPANIC_{jk} + r_{0jk}    (33)

\pi_{1jk} = \beta_{10k} + \beta_{11k} SEX_{jk} + \beta_{12k} BLACK_{jk} + \beta_{13k} HISPANIC_{jk} + r_{1jk}    (34)

\beta_{00k} = \gamma_{000} + \gamma_{001} PMINORITY_{k} + u_{00k}    (35)

\beta_{pqk} = \gamma_{pq0}    (36)

for p = 0, 1, q = 1, 2, 3, and p = q \neq 0.

For models PP-1, PP-2, PP-3 and PP-4:

i = 1, 2, ..., 5197 and j = 1, 2, ..., 118.

Rankings of schools are obtained from u_{0j}, which measures the deviation of the individual
school's intercept from the overall intercept of the schools. This estimate is shrinkage
adjusted by the number of repeated observations used to calculate this measure.

For models GAIN-1, GAIN-2, GAIN-3 and GAIN-4:

i = 1, 2, ..., 5197, j = 1, 2, ..., 5197 and k = 1, 2, ..., 118.

Rankings of schools are obtained from u_{00k}, which measures the deviation of the individual
school's intercept from the overall intercept of the schools. This estimate is also shrinkage
adjusted by the number of repeated observations used to calculate this measure.

Classroom Effective Indices are obtained from the student residuals, namely r_{ij} for the
PP models and r_{ijk} for the Gain models, which measure the deviation of the student's score
from its predicted value.
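The shrinkage adjustment mentioned above can be sketched for the simplest case. The paper does not spell out the formula applied by the HLM software, so the sketch assumes the textbook random-intercept form, in which a school's mean residual is multiplied by a reliability weight tau00 / (tau00 + sigma2 / n_j). The function name and the variance values are hypothetical.

```python
def shrunken_school_effect(mean_residual, n_students, tau00, sigma2):
    """Empirical Bayes estimate of a school effect in a random-intercept model.

    mean_residual: average student residual in the school
    n_students:    number of students observed in the school
    tau00:         between-school variance of the intercepts (assumed known)
    sigma2:        within-school residual variance (assumed known)
    """
    reliability = tau00 / (tau00 + sigma2 / n_students)
    return reliability * mean_residual

# Two hypothetical schools with the same raw mean residual but different sizes.
small_school = shrunken_school_effect(2.0, n_students=5, tau00=1.0, sigma2=20.0)
large_school = shrunken_school_effect(2.0, n_students=80, tau00=1.0, sigma2=20.0)
```

With these assumed values the small school's effect is pulled much closer to zero than the large school's, which is why schools with few students are less likely to appear at the extremes of the ranking.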



Results 



School Effective Indices 

Under the model assumptions, the u_{0j}'s for the 2-level models and the u_{00k}'s for the
3-level models are normally distributed with a mean of zero. This property matters if the
indices obtained from the models are used to rank schools, and especially if the top schools
are given performance awards. The need may arise to calculate confidence intervals, for which
the normality assumptions will be vital. The descriptive statistics for the measures of School
Effective Indices are as follows:
Model    Measure    Mean         St. Dev.   Kurtosis   Skewness   K-S P-Value
PP-1     u_{0j}      0.0000017   0.866      0.148      0.170      0.9126
Gain-1   u_{00k}     0.0052119   4.008      0.156      0.207      0.9546
PP-2     u_{0j}     -0.000002    0.703      0.907      0.442      0.9729
Gain-2   u_{00k}     0.0001864   3.135      0.649      0.454      0.5278
PP-3     u_{0j}     -0.000003    0.788      0.404      0.273      0.8197
Gain-3   u_{00k}     0.0018516   3.679      0.311      0.288      0.7969
PP-4     u_{0j}     -0.000000    0.696      0.102      0.534      0.9676
Gain-4   u_{00k}     0.0009754   3.180      0.828      0.522      0.8752



A Kolmogorov-Smirnov goodness-of-fit test was carried out to determine whether these measures
are normally distributed. The table above shows K-S p-values that are high, so we fail to
reject the null hypothesis that the values are from a normal distribution. The values of
kurtosis for the models PP-2, Gain-2 and Gain-4 may suggest that the distribution is not
normal, but the Kolmogorov-Smirnov test gives no evidence against normality.
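The kind of normality check described here can be sketched with the Python standard library. This is an illustration only, not the procedure or software used in this study: the function names are hypothetical, the two samples are synthetic quantile grids of size 118 rather than the actual school indices, and the asymptotic p-value ignores the fact that the mean and standard deviation are estimated from the same data, which makes the test conservative.

```python
import math
from statistics import NormalDist

def ks_statistic_normal(values):
    """K-S distance between the sample and a normal with the sample mean and SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    fitted = NormalDist(mean, sd)
    d = 0.0
    for i, v in enumerate(sorted(values), start=1):
        f = fitted.cdf(v)
        # Compare the fitted CDF with the empirical CDF just after and just before v.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_asymptotic_p_value(d, n, terms=100):
    """Asymptotic Kolmogorov p-value for distance d and sample size n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The alternating series converges slowly near zero; p is 1 to many decimals.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))

n_schools = 118
grid = [(i - 0.5) / n_schools for i in range(1, n_schools + 1)]
normal_like = [NormalDist().inv_cdf(p) for p in grid]  # symmetric, bell shaped
skewed = [-math.log(1.0 - p) for p in grid]            # strongly right skewed

p_normal = ks_asymptotic_p_value(ks_statistic_normal(normal_like), n_schools)
p_skewed = ks_asymptotic_p_value(ks_statistic_normal(skewed), n_schools)
```

A high p-value, as for `normal_like`, means only that the test fails to reject normality; a low one, as for `skewed`, is evidence against it.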

The table below gives the correlations of the ranking measures for each model with
one another and with two school level variables, MOBILITY (MOB) and PMINORITY (PMIN).
PMINORITY is included in models 2 and 4 to demonstrate how efficiently these models
remove school level biases, and MOBILITY is included as a control.





          PP-1     G-1    PP-2     G-2    PP-3     G-3    PP-4     G-4    PMIN     MOB
PP-1     1.000
G-1      0.988   1.000
PP-2     0.885   0.869   1.000
G-2      0.874   0.886   0.982   1.000
PP-3     0.987   0.974   0.925   0.912   1.000
G-3      0.978   0.989   0.906   0.922   0.987   1.000
PP-4     0.882   0.865   0.996   0.979   0.932   0.912   1.000
G-4      0.881   0.891   0.982   0.996   0.927   0.935   0.985   1.000
PMIN    -0.459  -0.465   0.000  -0.012  -0.357  -0.374  -0.000  -0.028   1.000
MOB     -0.215   0.202  -0.137  -0.112  -0.201  -0.188  -0.139  -0.128   0.246   1.000



From the table above we can observe that the school rankings obtained from all the
models are highly correlated with each other. For each pair of corresponding
Pre-Test/Post-Test and Gain models, the correlations range in value from 0.9819 to
0.9883. The correlation for the full models, with both student level and school level
variables, is 0.9847. Thus we can conclude that, for each of the four specifications
considered, the pair of models produces school rankings that do not differ substantially
from each other.
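The comparison of two sets of school indices can be sketched as follows. The index values are hypothetical and the helper names are introduced only for illustration; the sketch computes the correlation of the index values themselves and, as an assumption about how a ranking comparison might be done, the correlation of the resulting ranks.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def ranks(values):
    """Rank 1 goes to the largest value; this sketch assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

# Hypothetical indices for five schools from a PP model and from a Gain model.
pp_index = [0.9, -0.4, 0.1, -1.2, 0.6]
gain_index = [4.1, -1.0, -1.5, -5.3, 2.2]

corr_values = pearson(pp_index, gain_index)
corr_ranks = pearson(ranks(pp_index), ranks(gain_index))
```

In this toy case two neighbouring schools swap places between the two rankings, so the rank correlation is below one even though the index values are highly correlated.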

The last two rows and columns of the table correspond to two school level variables. Of
these, PMINORITY was included in the models PP-2, G-2, PP-4 and G-4. Comparing the
correlations, we can conclude that the bias associated with PMINORITY in the school rankings
is removed by including this variable in the model. The Pre-Test/Post-Test models did a
better job of removing the bias than the Gain models, since the latter have correlations with
PMINORITY that are larger in magnitude (-0.012 and -0.028, against 0.000 for the former).
The school level variable MOBILITY was not included in any of the models and, as can be seen
in the table above, its bias remains in the school rankings.

Classroom Effective Indices 

The process of obtaining Classroom Effective Indices from these models is more
complicated than for School Effective Indices. All the models above produce regression lines
which are school specific, i.e., there is a regression line for each school. If we estimate
r_{ij} for the PP models and r_{ijk} for the GAIN models using these lines, we obtain student
residuals within a specific school. Hence we calculate an average regression line (the
District Line) and calculate the student residuals from this line with respect to each model
as follows.

For the Pre-Test/Post-Test models:

r_{ij} = READ96_{ij} - (\gamma_{00} + \gamma_{10} READ95_{ij} + \gamma_{20} SEX_{ij} + \cdots)    (37)

For the GAIN models:

r_{ijk} = READ96_{ijk} - (\gamma_{000} + \gamma_{100} READ95_{ijk})    (38)
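A district-line residual of the kind defined in equation (37) can be sketched as below. The function name, the coefficient values and the student record are hypothetical; in practice the gamma coefficients would be the fixed-effect estimates reported by the fitted model.

```python
def district_residual(read96, read95, covariates,
                      gamma_intercept, gamma_read95, gamma_covariates):
    """Residual of one student about the district-wide regression line.

    covariates and gamma_covariates are parallel sequences, for example the
    student's SEX indicator and the fixed effect estimated for it.
    """
    predicted = gamma_intercept + gamma_read95 * read95
    predicted += sum(g * x for g, x in zip(gamma_covariates, covariates))
    return read96 - predicted

# Hypothetical fixed effects and one hypothetical student with SEX = 1.
residual = district_residual(
    read96=53.0, read95=50.0, covariates=[1.0],
    gamma_intercept=10.0, gamma_read95=0.8, gamma_covariates=[1.5],
)
```

Because only the fixed effects enter the prediction, the school-specific deviation is left inside the residual, which is what allows it to be aggregated later by classroom.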



The following table illustrates the correlations of the student residuals obtained from the 
eight models. 





          PP-1     G-1    PP-2     G-2    PP-3     G-3    PP-4     G-4
PP-1     1.000
G-1      0.957   1.000
PP-2     1.000   0.957   1.000
G-2      0.957   1.000   0.957   1.000
PP-3     0.997   0.955   0.996   0.955   1.000
G-3      0.957   1.000   0.957   1.000   0.955   1.000
PP-4     0.998   0.956   0.998   0.956   1.000   0.956   1.000
G-4      0.957   1.000   0.957   1.000   0.955   1.000   0.956   1.000



As can be seen from the above table, the residuals from the Pre-Test/Post-Test models (PP)
and the Gain models (G) are also highly correlated, with the lowest correlation being 0.955.
We can conclude from the above that the two types of models produce district-wide residuals
that carry essentially the same information.

There is a major drawback to calculating district-wide residuals from a multi-level model
with schools as a level. Within a school, the student residuals, i.e., r_{ij} for j fixed, have
a normal distribution with a mean of zero. This is not the case for student residuals
calculated district wide. Even if the model is not a multi-level model, but a linear regression
model, the student residuals will not be normal. This fact should be considered when these
residuals are analyzed and utilized.

Though the residuals of the various pairs of models are highly correlated, this may
not be evident from the descriptive statistics below.
Model    Mean     Median   St. Dev.   Kurtosis   Skewness   K-S P-Value
PP-1     -0.029    0.102    3.848     -0.171     -0.130     0.0026
Gain-1   -0.181   -1.857   19.43       0.537      0.372     0.0000
PP-2     -0.030    0.105    3.850     -0.181     -0.129     0.0015
Gain-2   -0.047   -1.619   19.4434     0.536      0.377     0.0000
PP-3     -0.017    0.081    3.826     -0.15      -0.120     0.0310
Gain-3   -0.087   -1.615   19.43       0.535      0.379     0.0000
PP-4     -0.021    0.116    3.830     -0.165     -0.123     0.0132
Gain-4   -0.130   -1.543   19.44       0.534      0.383     0.0000



A Kolmogorov-Smirnov goodness-of-fit test was carried out to determine whether these measures
are normally distributed. The table above shows K-S p-values that are low (all below 0.05),
hence rejecting the null hypothesis that the values are from a normal distribution. The values
of kurtosis and skewness for the PP models are smaller in magnitude than those for the GAIN
models. The medians for the PP models are closer to the mean than those of the GAIN
models. The fact that the means of the residuals are close to zero is encouraging.
These analyses were carried out with 5197 student residuals.

By aggregating these student residuals by each classroom of the student, Classroom 
Effective Indices can be obtained. 
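The aggregation step can be sketched as follows. The paper does not state which aggregate is used, so the sketch assumes a simple mean of the district-wide student residuals within each classroom; the function name, classroom labels and residual values are hypothetical.

```python
from collections import defaultdict

def classroom_effective_indices(residuals, classrooms):
    """Mean district-wide student residual for each classroom (assumed aggregate)."""
    grouped = defaultdict(list)
    for residual, classroom in zip(residuals, classrooms):
        grouped[classroom].append(residual)
    return {room: sum(values) / len(values) for room, values in grouped.items()}

# Hypothetical residuals for six students in two classrooms.
student_residuals = [1.5, 0.5, 2.5, -1.0, -2.0, 0.0]
student_classrooms = ["6A", "6A", "6A", "6B", "6B", "6B"]

cei = classroom_effective_indices(student_residuals, student_classrooms)
```

A positive index indicates that, on average, the students of that classroom scored above the district line, and a negative index that they scored below it.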



Summary 

Should we model post-test scores or gains, with pre-test scores as predictors, when
calculating student growth and school efficacy? This paper arrives at the conclusion that both
types of models produce very similar results. Since the two types of models are comparable,
the question arises of which one should be used. If HLM software from SSI Inc. is
used, then the two-level Pre-Test/Post-Test models are more convenient and efficient than the
three-level Gain models. In the two-level models, more level-1 and level-2 variables can
be introduced, with proper centering, to obtain complex models without any biases with respect
to the conditioning variables. This is not the case for 3-level HLM models, which are very
sensitive to multicollinearity and to low variances in the conditioning variables. As a result,
the number of level-2 (student level characteristics) and level-3 (school level) variables that
can be introduced into the model is limited. In the models analyzed in this paper, the school
level variable MOBILITY could not be introduced in the 3-level models due to software
limitations, though MOBILITY could easily be entered into the 2-level models. Further,
the iterative process of obtaining solutions to these 3-level hierarchical models using the EM
algorithm has difficulty calculating starting values. These starting values have to be
carefully estimated and supplied to the model to start the iterative process.

Given all these difficulties with the 3-level models, and the absence of any substantially
different results, the conclusion is that the Pre-Test/Post-Test 2-level model is
preferred over the 3-level Gain model.